All Questions
Tagged with scikit-learn, machine-learning
758 questions
0 votes, 0 answers, 12 views
Isolation Forest sample size
I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...
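With only 50 rows, scikit-learn's default `max_samples="auto"` resolves to `min(256, n_samples)`, so every tree already sees all 50 records. Below is a minimal sketch on hypothetical random data of the same shape. It shows how to confirm that setting and how to read the outputs. The seed-agreement loop at the end is an informal stability check (an assumption on my part), not an established tuning recipe.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical stand-in for the question's data: 50 records, 3 features.
X = rng.normal(size=(50, 3))

# max_samples="auto" means min(256, n_samples) rows per tree, i.e. all 50 here.
iso = IsolationForest(n_estimators=200, max_samples="auto", random_state=0).fit(X)
print(iso.max_samples_)  # 50

# score_samples: lower means more abnormal; predict: -1 = outlier, 1 = inlier.
scores = iso.score_samples(X)
labels = iso.predict(X)

# Informal stability check (assumption): refit with other seeds and measure
# how often the inlier/outlier labels agree with the first fit.
agreement = [
    float(np.mean(IsolationForest(n_estimators=200, random_state=s).fit(X).predict(X) == labels))
    for s in range(1, 6)
]
print(agreement)
```

If the flagged points change a lot from seed to seed, the result depends more on randomness than on structure in such a small sample.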
4 votes, 1 answer, 54 views
Unsupervised Isolation Forest sklearn hyperparameters
I am using sklearn's IsolationForest for an unsupervised anomaly detection task. According to the docs, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html, there are ...
-1 votes, 0 answers, 37 views
ML model for Career Prediction
I am NOT able to figure out how to make an ML model. I have been using ChatGPT for most of it and only understanding the code, so I'm doing next to nothing. No matter what code I input, the accuracy is always 0%... ...
2 votes, 1 answer, 44 views
I can't get my R² above 70%
I tried RandomForest, LGBM, KNeighbors and Polynomial Regression as algorithms, along with cross-validation, a train-test split and StandardScaler, but nothing seems to get it past the 70% mark. The dataframe has ...
1 vote, 1 answer, 45 views
RFECV and grid search - what sets to use for hyperparameter tuning?
I am running machine learning models (all with scikit-learn estimators, no neural networks) using a custom dataset with a number of features and binomial output. I first split the dataset into 0.6 (...
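A minimal sketch of one common arrangement that keeps feature selection and tuning off the held-out data. RFECV sits inside a Pipeline, so GridSearchCV re-runs the selection within every training fold, and the test split is used only once at the end. The synthetic data, the 80/20 split and the `C` grid are illustrative assumptions. The asker's own split ratios are cut off in the excerpt.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical binary-outcome data standing in for the custom dataset.
X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature selection is a pipeline step, so it is refit inside each CV training fold.
pipe = Pipeline([
    ("select", RFECV(LogisticRegression(max_iter=1000), cv=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.1, 1.0, 10.0]}  # illustrative grid (assumption)

search = GridSearchCV(pipe, param_grid, cv=StratifiedKFold(5, shuffle=True, random_state=0))
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_estimator_.named_steps["select"].n_features_)  # features kept by RFECV
print(search.score(X_test, y_test))  # touched once, after all tuning
```

The selector's own LogisticRegression keeps default settings here. Tuning it as well (e.g. through `select__estimator__C`) is a separate design choice that multiplies the number of fits.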
1 vote, 1 answer, 48 views
Manual Python Implementation of Stacking Model
I tried to build a Python class, CustomStackingClassifier(), to implement the stacking method in ensemble machine learning. In this implementation, the output of the base classifiers is set to be the ...
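For reference, here is a minimal generic sketch of stacking with out-of-fold predictions. It is not the asker's CustomStackingClassifier, whose details are cut off. The base models, the meta-model and the synthetic data are all assumptions. Because the level-1 training features are out-of-fold, the meta-model never learns from predictions a base model made on rows it was fitted on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical choice of base classifiers.
base_models = [DecisionTreeClassifier(max_depth=3, random_state=0), GaussianNB()]

# Level-1 training features: out-of-fold positive-class probabilities.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set to build test-time features.
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

# Meta-classifier learns how to combine the base models' outputs.
meta_model = LogisticRegression().fit(meta_train, y_train)
print(meta_model.score(meta_test, y_test))
```

scikit-learn's built-in `sklearn.ensemble.StackingClassifier` follows the same idea, training its final estimator on cross-validated base-model predictions. It can serve as a baseline to check a manual implementation against.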
3 votes, 1 answer, 81 views
Comparing clusterings from different datasets
I have 2 different data sets with essentially the same variables, though one is data from one year and the other is data from another year. I've run KModes on both data sets and now have some ...
2 votes, 2 answers, 142 views
Random Forest always predicting the majority class
I'm predicting disease outcome using biological data (metabolites plus covariates age, sex and BMI). The outcome is a binary variable and moderately imbalanced (~12% positive cases). I have a ...
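Class weighting and moving the decision threshold are two standard levers when a Random Forest keeps predicting the majority class. Below is a minimal sketch on synthetic data with roughly 12% positives (a stand-in, not the asker's metabolite data) that compares them using recall and balanced accuracy instead of plain accuracy. Neither lever is guaranteed to help on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: about 88% negatives, 12% positives.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.88], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for cw in (None, "balanced_subsample"):
    # class_weight="balanced_subsample" reweights classes within each bootstrap sample.
    rf = RandomForestClassifier(n_estimators=300, class_weight=cw, random_state=0)
    rf.fit(X_tr, y_tr)
    proba = rf.predict_proba(X_te)[:, 1]
    # A threshold below 0.5 trades some precision for more positive-class recall.
    for threshold in (0.5, 0.2):
        pred = (proba >= threshold).astype(int)
        print(cw, threshold,
              recall_score(y_te, pred),
              balanced_accuracy_score(y_te, pred))
```

Any threshold chosen this way should be picked on validation folds, not on the final test split.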
0 votes, 0 answers, 34 views
Is it possible to compute the Davies-Bouldin score from a precomputed distance matrix using sklearn?
I'm trying to compute the Davies-Bouldin score to compare different clustering approaches. I have a precomputed distance matrix (that represents edit-based distance between texts). I'm using the scikit-...
0 votes, 1 answer, 79 views
As an intermediate R programmer looking to dive into machine learning, should I choose Python or stick with R?
Background: I am an intermediate R programmer with some experience in machine learning concepts and simple modeling in R. I have an opportunity to collaborate with a professional machine learning team ...
0 votes, 0 answers, 271 views
Correct method to report Randomized Search CV results
I have searched online but I still cannot find a definitive answer on how to "correctly" report the results from hyperparameter tuning a machine learning model; though, this may just be some ...
-1 votes, 1 answer, 58 views
Label encoding & one-hot encoding
I have read somewhere that label encoding is only used for the target variable, and that for the input features we can use one-hot encoding (nominal features) and ordinal encoding (features that have an order). I am ...
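Here is a minimal sketch of the three encoders on hypothetical toy columns. scikit-learn's LabelEncoder is intended for the 1-D target. OneHotEncoder gives each nominal category its own 0/1 column. OrdinalEncoder, given an explicit `categories` list, keeps the real order of the levels instead of the default alphabetical one.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Hypothetical toy data.
color = np.array([["red"], ["green"], ["blue"], ["green"]])     # nominal: no order
size = np.array([["small"], ["large"], ["medium"], ["small"]])  # ordinal: small < medium < large
y = np.array(["cat", "dog", "cat", "bird"])                     # target

# LabelEncoder: classes sorted alphabetically -> bird=0, cat=1, dog=2.
y_enc = LabelEncoder().fit_transform(y)

# OneHotEncoder: columns for blue, green, red; no order is implied.
color_enc = OneHotEncoder(handle_unknown="ignore").fit_transform(color).toarray()

# OrdinalEncoder with explicit categories: small=0, medium=1, large=2.
size_enc = OrdinalEncoder(categories=[["small", "medium", "large"]]).fit_transform(size)

print(y_enc)             # [1 2 1 0]
print(color_enc)
print(size_enc.ravel())  # [0. 2. 1. 0.]
```

Without the explicit `categories` list, OrdinalEncoder would sort the levels alphabetically (large, medium, small), which scrambles their meaning.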
0 votes, 0 answers, 11 views
Implementation of multiclass classification meta-estimators in scikit-learn
In scikit-learn there are different methods for dealing with multiclass classification problems. Below are some of the meta-estimators used: a. OneVsRestClassifier and ...
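This minimal sketch contrasts two of these meta-estimators on a hypothetical 4-class synthetic problem. OneVsRestClassifier fits one binary model per class (that class versus all others). OneVsOneClassifier fits one binary model per pair of classes, K(K-1)/2 in total, so 4 versus 6 models here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)

base = LogisticRegression(max_iter=1000)  # cloned internally by each meta-estimator

ovr = OneVsRestClassifier(base).fit(X, y)  # one classifier per class
ovo = OneVsOneClassifier(base).fit(X, y)   # one classifier per pair of classes

print(len(ovr.estimators_))  # 4
print(len(ovo.estimators_))  # 6
print(ovr.predict(X[:5]), ovo.predict(X[:5]))
```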
4 votes, 2 answers, 177 views
Loss function in Isolation Forest
I recently came across this algorithm while working on my graduation project. As I understand it, we create subtrees for each subsample. Then we calculate the scores for each ...
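In the original Isolation Forest formulation, trees are grown with randomly chosen features and split values rather than by minimizing a loss. A point's score comes from its average path length `E[h(x)]` across the trees, normalized by `c(n)`, the average path length of an unsuccessful binary-search-tree search over `n` points: `s(x, n) = 2 ** (-E[h(x)] / c(n))`. Below is a minimal pure-Python sketch of that normalization. The harmonic-number approximation `H(i) ≈ ln(i) + 0.5772156649` is part of the same formulation. The exact small-`n` handling (0 for `n <= 1`, 1 for `n == 2`) is an assumption that mirrors common implementations.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # small-n handling: assumption mirroring common implementations
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); near 1 suggests an anomaly."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256  # hypothetical subsample size per tree
print(anomaly_score(average_path_length(n), n))  # 0.5: path length equals the average
print(anomaly_score(3.0, n) > 0.5)   # True: isolated after few splits, more anomalous
print(anomaly_score(12.0, n) < 0.5)  # True: needs many splits, more normal
```

scikit-learn's `IsolationForest.score_samples` is documented as the opposite (negative) of this paper score, so lower values there mean more abnormal.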
3 votes, 2 answers, 576 views
How can I fit sklearn.svm.SVC with three features, given that the features are actually arrays of lengths 128, 12 and 40?
To clarify, each instance of feature_1 is a 128-item array, each instance of feature_2 is a 12-item array, and each instance of feature_3 is a 40-item array. I am currently simply doing ...
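A common approach is to flatten the three arrays into one row of 128 + 12 + 40 = 180 columns per sample, because SVC expects a 2-D `(n_samples, n_features)` matrix. Below is a minimal sketch on hypothetical random data with the shapes from the question. The variable names follow the question, while the labels, sample count and pipeline choices are assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples = 60  # hypothetical

# Hypothetical random data with the per-instance lengths from the question.
feature_1 = rng.normal(size=(n_samples, 128))
feature_2 = rng.normal(size=(n_samples, 12))
feature_3 = rng.normal(size=(n_samples, 40))
y = rng.integers(0, 2, size=n_samples)  # hypothetical binary labels

# Concatenate the three array-valued features into one flat row per sample.
X = np.hstack([feature_1, feature_2, feature_3])
print(X.shape)  # (60, 180)

# Standardize each column so differently scaled features contribute comparably
# to the RBF distance; the 128-column block still carries more total weight
# simply because it has more columns.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(clf.predict(X[:5]))
```

If one block should not dominate just by its width, one option is to reweight or reduce that block (e.g. with PCA) before concatenating. That is a modeling choice, not something the flattening step handles on its own.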